

Search for: All records

Creators/Authors contains: "Laefer, D."


  1. ISMLG 2023: 4th International Symposium on Machine Learning and Big Data in Geoscience, University College Cork, 29th August – 1st September 2023
  2. Abstract. Most deep learning (DL) methods that are not end-to-end use several multi-scale, multi-type hand-crafted features, which make the network harder to train, more computationally intensive, and more vulnerable to overfitting. Furthermore, reliance on empirically based feature dimensionality reduction may lead to misclassification. In contrast, efficient feature management reduces storage and computational complexity, builds better classifiers, and improves overall performance. Principal Component Analysis (PCA) is a well-known dimensionality reduction technique that has been used for feature extraction. This paper presents a two-step, PCA-based feature extraction algorithm that employs a variant of feature-based PointNet (Qi et al., 2017a) for point cloud classification. The paper extends the PointNet framework for use on large-scale aerial LiDAR data and contributes by (i) developing a new feature extraction algorithm, (ii) exploring the impact of dimensionality reduction in feature extraction, and (iii) introducing a non-end-to-end PointNet variant for per-point classification in point clouds. This is demonstrated on aerial laser scanning (ALS) point clouds. The algorithm successfully reduces the dimension of the feature space without sacrificing performance, as benchmarked against the original PointNet algorithm. When tested on the well-known Vaihingen data set, the proposed algorithm achieves an Overall Accuracy (OA) of 74.64% using 9 input vectors and 14 shape features, whereas with the same 9 input vectors and only 5 PCs (principal components built from the 14 shape features) it achieves a higher OA of 75.36%, demonstrating the effect of efficient dimensionality reduction.
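
As an illustration of the dimensionality reduction step described in this entry, the sketch below reduces a set of hand-crafted per-point shape features to five principal components before classification. It is a minimal sketch, not the paper's implementation; the feature counts, scaling step, and scikit-learn usage are assumptions chosen only to match the figures quoted in the abstract (9 input vectors, 14 shape features, 5 PCs).

```python
# Minimal sketch (not the authors' code): reducing hand-crafted per-point
# shape features to a handful of principal components before classification.
# Feature names and dimensions here are illustrative assumptions.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_points = 10_000
shape_features = rng.normal(size=(n_points, 14))   # e.g. 14 hand-crafted shape features
point_inputs = rng.normal(size=(n_points, 9))      # e.g. 9 raw input vectors (x, y, z, ...)

# Standardise, then keep the first 5 principal components of the shape features.
scaled = StandardScaler().fit_transform(shape_features)
pcs = PCA(n_components=5).fit_transform(scaled)

# Concatenate raw inputs with the 5 PCs; this matrix would feed the classifier.
classifier_input = np.hstack([point_inputs, pcs])
print(classifier_input.shape)  # (10000, 14)
```
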
  3. Abstract. State-of-the-art remote sensing image management systems adopt scalable databases and employ sophisticated indexing techniques to perform window and containment queries. Many rely on space-filling curve (SFC) based index techniques designed for key-value databases, which are predominantly applicable to iso-oriented images. Critically, these indexes do not consider the high degree of overlap among images that exists in many data sets, nor the affiliated storage requirements. Specifically, employing an SFC-based grid cell index in concert with the ground footprint coverage of the images requires storing a unique image object identification (IOI) for each image in every grid cell where overlap occurs. Such an approach adversely affects both storage and query response times. In response, this paper presents an optimization technique for SFC-based grid cell space indexing. The optimization is specifically designed for window and containment queries where the region of interest overlaps with at least a 2 × 2 grid of cells. The technique is based on four cell removal steps and is thus called the “four-step algorithm” (4SA). Each step employs a unique spatial configuration to check for continuous spatial extent; if present, the IOI of the target cell is omitted from further consideration. Analysis and experiments on real-world and synthetic image data demonstrated that 4SA reduced storage demands by 41.3%–47.8%. Furthermore, in the querying experiments, only 42% of IOI elements needed to be processed, yielding a 58% productivity gain. The reduction of IOI elements in querying also reduced CPU execution time (by 3.0%–5.2%). The 4SA also demonstrated data scalability and concurrent-user scalability in querying large regions, completing the index search 1.86%–3.35% faster than when 4SA was not applied.
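
The sketch below illustrates the underlying grid-cell indexing problem that 4SA targets: every grid cell overlapped by an image footprint stores that image's IOI, so heavily overlapping collections inflate the index. The Morton (Z-order) curve, grid parameters, and IOI value are illustrative assumptions; the four pruning steps of 4SA themselves are not reproduced here.

```python
# Minimal sketch (assumptions, not the paper's implementation): enumerating the
# Z-order (Morton) grid cells covered by an image footprint so each cell key can
# map to the image object identification (IOI) in a key-value store.
def interleave_bits(x: int, y: int) -> int:
    """Interleave the bits of two 16-bit cell coordinates into a Morton key."""
    key = 0
    for i in range(16):
        key |= ((x >> i) & 1) << (2 * i) | ((y >> i) & 1) << (2 * i + 1)
    return key

def covered_cells(min_x, min_y, max_x, max_y, cell_size):
    """Yield (morton_key, (col, row)) for every grid cell the footprint overlaps."""
    c0, c1 = int(min_x // cell_size), int(max_x // cell_size)
    r0, r1 = int(min_y // cell_size), int(max_y // cell_size)
    for col in range(c0, c1 + 1):
        for row in range(r0, r1 + 1):
            yield interleave_bits(col, row), (col, row)

# Example: index one image footprint under an (assumed) IOI.
index = {}
for key, _cell in covered_cells(120.0, 45.0, 380.0, 210.0, cell_size=100.0):
    index.setdefault(key, set()).add("IOI-0001")
print(len(index), "cell keys store this image")
```
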
  4. Abstract. This study investigates the inability of two popular data splitting techniques, the train/test split and k-fold cross-validation, which are used to create training and validation data sets, to achieve sufficient generality for supervised deep learning (DL) methods. This failure is caused mainly by their limited ability to create new data. The bootstrap, in contrast, is a computer-based statistical resampling method that has been used efficiently to estimate the distribution of a sample estimator and to assess a model without knowledge of the population. This paper couples cross-validation and the bootstrap to combine their respective advantages as data generation strategies and to achieve better generalization of a DL model. The paper contributes by: (i) developing an algorithm for better selection of training and validation data sets, (ii) exploring the potential of the bootstrap for drawing statistical inference on the necessary performance metrics (e.g., mean square error), and (iii) introducing a method that can assess and improve the efficiency of a DL model. The proposed method is applied to semantic segmentation and is demonstrated via a DL-based classification algorithm, PointNet, on aerial laser scanning point cloud data.
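
A minimal sketch of the coupling idea, under stated assumptions: each k-fold training split is resampled with replacement, and the bootstrap replicates provide a confidence interval on a performance metric. A toy ridge regressor and mean squared error stand in for the paper's PointNet experiments; the fold and replicate counts are arbitrary.

```python
# Minimal sketch: couple k-fold splits with bootstrap resampling of each
# training fold, then use the bootstrap replicates to put a confidence
# interval on a performance metric (here MSE on a toy regressor).
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 10))
y = X @ rng.normal(size=10) + rng.normal(scale=0.5, size=500)

mse_replicates = []
for train_idx, val_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    for _ in range(20):  # bootstrap replicates of the training fold
        boot = rng.choice(train_idx, size=len(train_idx), replace=True)
        model = Ridge().fit(X[boot], y[boot])
        mse_replicates.append(mean_squared_error(y[val_idx], model.predict(X[val_idx])))

lo, hi = np.percentile(mse_replicates, [2.5, 97.5])
print(f"MSE 95% bootstrap interval: [{lo:.3f}, {hi:.3f}]")
```
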
  5. Abstract. While data on human behavior in COVID-19-rich environments have been captured and publicly released, the spatial components of such data are recorded in two dimensions. Thus, the complete roles of the built and natural environment cannot be readily ascertained. This paper introduces a mechanism for the three-dimensional (3D) visualization of the egress behaviors of individuals leaving a COVID-19-exposed healthcare facility in Spring 2020 in New York City. Behavioral data were extracted and projected onto a 3D aerial laser scanning point cloud of the surrounding area rendered with Potree, a readily available open-source Web Graphics Library (WebGL) point cloud viewer. The outcomes were 3D heatmap visualizations of the built environment that indicated the event locations of individuals exhibiting specific characteristics (e.g., men vs. women; public transit users vs. private vehicle users). These visualizations enabled interactive navigation through the space and are accessible through any modern web browser supporting WebGL. Visualizing egress behavior in this manner may highlight patterns indicative of correlations between the environment, human behavior, and transmissible diseases. Findings using such tools have the potential to identify high-exposure areas and surfaces such as doors, railings, and other physical features. Providing flexible visualization capabilities with 3D spatial context can enable analysts to quickly advise and communicate vital information across a broad range of use cases. This paper presents such an application to extract the public health information necessary to form localized responses to reduce COVID-19 infection and transmission rates in urban areas.
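
As a hedged illustration of the projection step described in this entry (not the authors' pipeline), the sketch below assigns 2D event locations to their nearest points in a georeferenced point cloud, producing a per-point scalar field that a viewer such as Potree could colour as a heatmap. The coordinates, counts, and use of a k-d tree are assumptions.

```python
# Minimal sketch: match 2D egress event locations to the nearest points of a
# 3D point cloud (in plan view) so per-point event counts can drive a heatmap.
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(1)
cloud = rng.uniform(0, 100, size=(100_000, 3))      # x, y, z of the scanned scene
events_xy = rng.uniform(0, 100, size=(250, 2))      # 2D egress event locations

# Match each event to the nearest cloud point in plan (x, y) view.
tree = cKDTree(cloud[:, :2])
_, nearest = tree.query(events_xy, k=1)

# Per-point event counts become the scalar field driving the heatmap colours.
heat = np.zeros(len(cloud))
np.add.at(heat, nearest, 1)
print("points with at least one event:", int((heat > 0).sum()))
```
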
  6. Abstract. Classifying objects within aerial Light Detection and Ranging (LiDAR) data is an essential task to which machine learning (ML) is increasingly applied. ML has been shown to be more effective on LiDAR than on imagery for classification, but most efforts have focused on imagery because of the challenges presented by LiDAR data. LiDAR datasets are of higher dimensionality, discontinuous, heterogeneous, spatially incomplete, and often scarce. As such, there has been little examination of the fundamental properties of the training data required for acceptable performance of classification models tailored to LiDAR data. The quantity of training data is one such crucial property, because training on different sizes of data provides insight into a model’s performance with differing data sets. This paper assesses the impact of training data size on the accuracy of PointNet, a widely used ML approach for point cloud classification. Models trained on subsets of ModelNet ranging from 40 to 9,843 objects were validated on a test set of 400 objects. Accuracy improved logarithmically, decelerating from 45 objects onwards and slowing significantly at a training size of 2,000 objects, corresponding to 20,000,000 points. This work contributes to the theoretical foundation for the development of LiDAR-focused models by establishing a learning curve, suggesting the minimum quantity of manually labelled data necessary for satisfactory classification performance, and providing a path for further analysis of the effects of modifying training data characteristics.
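
The sketch below reproduces only the shape of the experiment described in this entry: train on progressively larger subsets of the training data and record accuracy on a fixed test set to trace a learning curve. The synthetic data, random-forest classifier, and subset sizes are stand-in assumptions; the paper's experiments use PointNet on ModelNet.

```python
# Minimal sketch of a learning-curve experiment: subsample the training set at
# several sizes, retrain, and evaluate on a fixed held-out test set.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=12_000, n_features=30, n_informative=10,
                           n_classes=4, random_state=0)
X_train, y_train, X_test, y_test = X[:10_000], y[:10_000], X[10_000:], y[10_000:]

rng = np.random.default_rng(0)
for size in (40, 100, 500, 2_000, 10_000):
    idx = rng.choice(len(X_train), size=size, replace=False)
    model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train[idx], y_train[idx])
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"train size {size:>6}: accuracy {acc:.3f}")
```
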
  7. Abstract. Point density is an important property that dictates the usability of a point cloud data set. This paper introduces an efficient, scalable, parallel algorithm for computing the local point density index, a sophisticated point cloud density metric. Computing the local point density index is non-trivial, because it involves a neighbour search for each individual point in the potentially large input point cloud. Most existing algorithms and software are incapable of computing point density at scale. Therefore, the algorithm introduced in this paper aims to provide both the computational efficiency and the scalability needed to consider this factor in large, modern point clouds such as those collected in national or regional scans. The proposed algorithm is composed of two stages. In stage 1, a point-level parallel processing step partitions an unstructured input point cloud into partially overlapping, buffered tiles. A buffer is provided around each tile so that the data partitioning does not introduce spatial discontinuity into the final results. In stage 2, the buffered tiles are distributed to different processors for computing the local point density index in parallel. That tile-level parallel processing step is performed using a conventional algorithm with an R-tree data structure. While straightforward, the proposed algorithm is efficient and particularly suitable for processing large point clouds. Experiments conducted using a 1.4 billion point data set acquired over part of Dublin, Ireland demonstrated an efficiency factor of up to 14.8/16: the computational time was reduced by 14.8 times when the number of processes (i.e. executors) increased by 16 times. Computing the local point density index for the 1.4 billion point data set took just over 5 minutes with 16 executors and 8 cores per executor, a reduction in computational time of nearly 70 times compared to the 6 hours required without parallelism.
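
A minimal sketch of the two-stage idea, under stated assumptions: the cloud is partitioned into buffered strip tiles, and each tile's local point density is computed in a separate process. A k-d tree, a Python multiprocessing pool, and the radius, tile, and buffer sizes are illustrative substitutes for the R-tree-based, executor-level parallelism described in the paper.

```python
# Minimal sketch: stage 1 partitions the cloud into buffered strip tiles (along
# x only, for brevity); stage 2 computes a local point density per point in
# parallel, one tile per process. Results are grouped by tile, not input order.
import numpy as np
from multiprocessing import Pool
from scipy.spatial import cKDTree

RADIUS, TILE, BUFFER = 1.0, 50.0, 1.0  # metres (illustrative values)

def tile_density(args):
    core, buffered = args
    # Neighbour counts within RADIUS, using buffered points so tile edges are seamless.
    tree = cKDTree(buffered[:, :2])
    counts = tree.query_ball_point(core[:, :2], r=RADIUS, return_length=True)
    return counts / (np.pi * RADIUS ** 2)  # local point density (pts / m^2)

def make_tiles(points):
    tx = np.floor(points[:, 0] / TILE)
    for t in np.unique(tx):
        core = points[tx == t]
        lo, hi = t * TILE - BUFFER, (t + 1) * TILE + BUFFER
        buffered = points[(points[:, 0] >= lo) & (points[:, 0] < hi)]
        yield core, buffered

if __name__ == "__main__":
    pts = np.random.default_rng(0).uniform(0, 500, size=(1_000_000, 3))
    with Pool(processes=4) as pool:
        densities = np.concatenate(pool.map(tile_density, list(make_tiles(pts))))
    print("mean local point density:", densities.mean())
```
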
  8. Abstract. Each year, lives are needlessly lost to floods due to residents failing to heed evacuation advisories. Risk communication research suggests that flood warnings need to be more vivid, contextualized, and visualizable in order to engage the message recipient. This paper makes the case for the development of a low-cost augmented reality tool that enables individuals to visualize, at close range and in three dimensions, their homes, schools, and places of work and worship subjected to flooding (modeled upon a series of federally expected flood hazard levels). This paper also introduces initial tool development in this area and the related data input stream.
  9. Abstract. The massive amounts of spatio-temporal information often present in LiDAR data sets make their storage, processing, and visualisation computationally demanding. There is an increasing need for systems and tools that support all the spatial and temporal components and the three-dimensional nature of these datasets for effortless retrieval and visualisation. In response to these needs, this paper presents a scalable, distributed database system designed explicitly for retrieving and viewing large LiDAR datasets on the web. The ultimate goal of the system is to provide rapid and convenient access to a large repository of LiDAR data hosted on a distributed computing platform. The system is composed of multiple shared-nothing nodes operating in parallel; that is, each node is autonomous and has a dedicated set of processors and memory. The nodes communicate with each other via an interconnected network. The data management system presented in this paper is implemented on Apache HBase, a distributed key-value datastore within the Hadoop ecosystem. HBase is extended with new data encoding and indexing mechanisms to accommodate both the point cloud and the full-waveform components of LiDAR data. The data can be consumed by any desktop or web application that communicates with the data repository using the HTTP protocol; the communication is enabled by a web servlet. In addition to the command line tool used for administration tasks, two web applications are presented to illustrate the types of user-facing applications that can be coupled with the data system.
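
The entry above mentions new data encoding and indexing mechanisms but does not spell them out, so the sketch below shows one common way to build a sortable row key for a key-value store such as HBase: a 2D Morton code combined with an acquisition timestamp. The encoding, field widths, and cell size are assumptions, not the system's actual scheme.

```python
# Minimal sketch (an assumed encoding): a fixed-width, big-endian row key that
# combines a 2D Morton code with a timestamp, so byte-wise sorted range scans
# retrieve points that are close in space and time.
import struct

def morton2d(x: int, y: int, bits: int = 21) -> int:
    """Interleave the bits of two non-negative grid coordinates."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i) | ((y >> i) & 1) << (2 * i + 1)
    return key

def row_key(x: float, y: float, t_epoch: int, cell: float = 1.0) -> bytes:
    """Big-endian fixed-width integers sort lexicographically like the integers."""
    spatial = morton2d(int(x / cell), int(y / cell))
    return struct.pack(">QI", spatial, t_epoch)

# Example: nearby points tend to produce keys that are close under a byte-wise sort.
k1 = row_key(315.2, 233.9, 1_600_000_000)
k2 = row_key(315.6, 234.1, 1_600_000_100)
print(k1.hex(), k2.hex())
```
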